Active Learning with Boosting for Spam Detection

Author

  • Enela Pema
Abstract

Spam detection algorithms have been developed that train on a sufficiently large set of labeled data and predict whether an email is spam with a high accuracy of 95%. A problem that arises in this setting is that labeling examples is costly: humans must read the emails one by one and classify them. Active learning is a learning approach developed to address this problem. It starts from a small set of labeled examples and uses strategies that improve learning by labeling additional examples. Its goal is to produce a good classifier from as few labeled examples as possible. Boosting algorithms fit intuitively into the active learning framework because they are good at distinguishing significant examples. In this report, I study the performance and behavior of active learning in combination with four boosting algorithms: AdaBoost, AdaBoost*, LPBoost, and ERLPBoost.
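
The loop described in the abstract can be illustrated with a minimal sketch. The abstract does not state which query strategy the report uses, so the sketch assumes a common one: train plain AdaBoost over decision stumps on the labeled set, then ask for the label of the unlabeled example whose normalized ensemble margin is closest to zero. All function names, parameters, and the `oracle` callback (standing in for the human labeler) are hypothetical and not taken from the report.

```python
import math
import random


def stump_predict(x, feature, threshold, polarity):
    """Decision stump: vote `polarity` if x[feature] >= threshold, else the opposite."""
    return polarity if x[feature] >= threshold else -polarity


def train_stump(X, y, w):
    """Exhaustively find the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, j, t, s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best


def adaboost(X, y, rounds=10):
    """Plain AdaBoost with labels in {-1, +1}; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, j, t, s = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, t, s))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, t, s))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble


def score(ensemble, x):
    """Normalized ensemble vote in [-1, 1]; values near 0 mean low confidence."""
    total = sum(a for a, _, _, _ in ensemble) or 1.0
    return sum(a * stump_predict(x, j, t, s) for a, j, t, s in ensemble) / total


def active_learn(pool, oracle, n_seed=4, n_queries=10, seed=0):
    """Pool-based active learning (assumed strategy: query the smallest |margin|)."""
    rng = random.Random(seed)
    unlabeled = list(range(len(pool)))
    rng.shuffle(unlabeled)
    labels = {}
    for _ in range(n_seed):  # small random seed set, labeled up front
        i = unlabeled.pop()
        labels[i] = oracle(pool[i])
    for _ in range(n_queries):
        idx = list(labels)
        model = adaboost([pool[i] for i in idx], [labels[i] for i in idx])
        # Ask the oracle about the example the current ensemble is least sure of.
        q = min(unlabeled, key=lambda i: abs(score(model, pool[i])))
        unlabeled.remove(q)
        labels[q] = oracle(pool[q])
    idx = list(labels)
    return adaboost([pool[i] for i in idx], [labels[i] for i in idx]), labels
```

Here `pool` would be a list of numeric feature vectors (for email, something like word counts), and `oracle` returns +1 for spam and -1 otherwise. Swapping AdaBoost for AdaBoost*, LPBoost, or ERLPBoost would only change the `adaboost` function; the query loop stays the same.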

Similar resources

An Effective Model for SMS Spam Detection Using Content-based Features and Averaged Neural Network

In recent years, there has been considerable interest in using short message service (SMS) as one of the essential and straightforward communication services on mobile devices. The increased popularity of this service has also increased the number of attacks on mobile devices, such as SMS spam messages. SMS spam messages constitute a real problem to mobile subscribers; this worries telecomm...

Experiments on Spam Detection with Boosting, Svm and Naive Bayes

For this project, I implement three popular text classification algorithms for spam detection, namely AdaBoost, Support Vector Machines, and Naive Bayes. Performance is evaluated on several test datasets. All experiments are done in Matlab. The experimental result is that all three algorithms perform satisfactorily on spam detection. In terms of accuracy, AdaBoost has the best error bound. On th...

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

Email is one of the fastest and most economical means of communication. The growing number of email users has increased spam in recent years. Spam not only costs users money, time, and bandwidth, but has also become a risk to the efficiency, reliability, and security of a network. Spam developers are always trying to find ways to escape the existing filters therefore new filters to de...

A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Spam is unwanted email that harms communications around the world. It is a growing problem for personal email, so detecting it is essential. Machine learning is well suited to this problem because its adaptive nature lets it learn the patterns required for classification. Nonetheless, in spam detection, there are a large num...

A Hybrid Framework for Building an Efficient Incremental Intrusion Detection System

In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use a boosting ensemble of weak classifiers to implement the misuse intrusion detection system. It can identify new types of intrusions that do not exist in the training dataset for incremental misuse detection. As...

Publication date: 2009